Implementing a high performance tensor library

Author

Walter Landry

Abstract

Template methods have opened up a new way of building C++ libraries. These methods allow the libraries to combine the seemingly contradictory qualities of ease of use and uncompromising efficiency. However, libraries that use these methods are notoriously difficult to develop. This article examines the benefits reaped and the difficulties encountered in using these methods to create a friendly, high-performance tensor library. We find that template methods mostly deliver on this promise, though requiring moderate compromises in either usability or efficiency.
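
The abstract does not show what a template method looks like. As a rough illustration only, the sketch below uses expression templates, one well-known C++ template technique, to add fixed-size vectors in a single loop with no intermediate vector objects. The names `Expr`, `Sum`, and `Vec3` are hypothetical; they are not taken from the article or from its library, and the article's own design may differ.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base (hypothetical name): tags a type as a vector expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for lhs + rhs. It stores references only; nothing is
// computed until an element is requested through operator().
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator()(std::size_t i) const { return lhs(i) + rhs(i); }
};

// Concrete three-component vector.
struct Vec3 : Expr<Vec3> {
    double data[3];
    Vec3(double x, double y, double z) : data{x, y, z} {}

    double operator()(std::size_t i) const {
        assert(i < 3);
        return data[i];
    }

    // Assigning from any expression evaluates it element by element
    // in one loop, so no temporary Vec3 is created.
    template <typename E>
    Vec3& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < 3; ++i) data[i] = expr(i);
        return *this;
    }
};

// Builds the expression tree; the arithmetic itself is deferred.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With this sketch, `d = a + b + c;` reads like ordinary arithmetic, yet the compiler can inline it into one loop over three elements. The cost is the kind of difficulty the abstract alludes to: the types become deeply nested, and because `Sum` holds references, an expression must be consumed within the statement that builds it.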

Similar resources

cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs

We introduce the CUDA Tensor Transpose (cuTT) library that implements high-performance tensor transposes for NVIDIA GPUs with Kepler and above architectures. cuTT achieves high performance by (a) utilizing two GPU-optimized transpose algorithms that both use a shared memory buffer in order to reduce global memory access scatter, and by (b) computing memory positions of tensor elements using a t...

New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations

This article presents an open-source object-oriented C++ library of classes and routines to perform tensor algebra. The primary purpose of the library is to enable post-Hartree–Fock electronic structure methods; however, the code is general enough to be applicable in other areas of physical and computational sciences. The library supports tensors of arbitrary order (dimensionality), size, and sy...

Assessment of the Log-Euclidean Metric Performance in Diffusion Tensor Image Segmentation

Introduction: Appropriate definition of the distance measure between diffusion tensors has a deep impact on Diffusion Tensor Image (DTI) segmentation results. The geodesic metric is the best distance measure since it yields high-quality segmentation results. However, the important problem with the geodesic metric is a high computational cost of the algorithms based on it. The main goal of this ...

High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic

Researchers from diverse disciplines are increasingly incorporating numeric high-order data, i.e., numeric tensors, within their practice. Just like the matrix-vector (MV) paradigm, the development of multi-purpose, but high-performance, sparse data structures and algorithms for arithmetic calculations, e.g., those found in Einstein-like notation, is crucial for the continued adoption of tensors...

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined in ATLAS by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing com...

Journal:

Scientific Programming

Volume: 11

Publication date: 2003
